Exploit Temporal Locality of Shared Data in SRC Enabled CMP

Authors

  • Haixia Wang
  • Dongsheng Wang
  • Peng Li
  • Jinglei Wang
  • XianPing Fu
Abstract

Through run-time characterization of parallel workloads, we found that the majority of shared-data accesses in parallel workloads exhibit temporal locality. Based on this characteristic, we present a CMP architecture built around a sharing relation cache (SRC for short), which saves recently used sharing relations to provide destination-set information for subsequent cache-to-cache miss requests. The Token-SRC protocol integrates the SRC into the token protocol, reducing the token protocol's network traffic. Simulations using the SPLASH-2 benchmarks show that a 16-core CMP system with Token-SRC reduces network traffic by 15% on average compared with the same system using the token protocol.
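The idea of saving recently used sharing relations can be illustrated with a minimal Python sketch. It is not the authors' design: the abstract does not describe how the SRC is organized, so the class name `SharingRelationCache`, the LRU replacement policy, the per-block indexing, and the fallback to a full broadcast on an SRC miss are all assumptions made for illustration.

```python
from collections import OrderedDict


class SharingRelationCache:
    """Hypothetical SRC model: block address -> set of cores recently sharing it."""

    def __init__(self, capacity, num_cores):
        self.capacity = capacity                      # assumed: fixed number of entries
        self.all_cores = frozenset(range(num_cores))
        self.entries = OrderedDict()                  # kept in LRU order (assumption)

    def record(self, block, sharers):
        """Save or refresh the sharing relation observed for a block."""
        self.entries[block] = frozenset(sharers)
        self.entries.move_to_end(block)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)          # evict the least recently used entry

    def destination_set(self, block, requester):
        """Return the cores a cache-to-cache miss request would be sent to."""
        sharers = self.entries.get(block)
        if sharers is None:
            # Assumed fallback on an SRC miss: send to every other core.
            return self.all_cores - {requester}
        self.entries.move_to_end(block)               # a hit counts as a recent use
        return sharers - {requester}


src = SharingRelationCache(capacity=2, num_cores=16)
src.record(0x40, {1, 3})                              # cores 1 and 3 shared block 0x40
hit = src.destination_set(0x40, requester=3)          # narrowed to the recorded sharer
miss = src.destination_set(0x80, requester=3)         # unknown block: all other cores
```

In this toy model, a request for a block with a saved sharing relation goes to one core instead of fifteen, which is the kind of traffic saving the abstract attributes to temporal locality of shared data.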

Similar resources

Towards zero latency photonic switching in shared memory networks

Photonic networks-on-chip based on silicon photonics have been proposed to reduce latency and power consumption in future chip multi-core processors (CMP). However, high performance CMPs use a shared memory model which generates large numbers of short messages, creating high arbitration latency overhead for photonic switching networks. In this paper we explore techniques which intelligently use...

Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?

On Chip Multiprocessors (CMP), it is common that multiple cores share certain levels of cache. The sharing increases the contention in cache and memory-to-chip bandwidth, further highlighting the importance of data locality analysis. As a rigorous and hardware-independent locality metric, reuse distance has served for a variety of locality analysis, program transformations, and performance pred...

CASH: Revisiting Hardware Sharing in Single-Chip Parallel Processors

As increasing the issue width yields diminishing returns on superscalar processors, thread parallelism on a single chip is becoming a reality. In the past few years, both SMT (Simultaneous MultiThreading) and CMP (Chip MultiProcessor) approaches were first investigated by academics and are now implemented by the industry. In some sense, CMP and SMT represent two extreme design points. In thi...

Program Transformations for Cache Locality Enhancement on Shared-memory Multiprocessors

Program Transformations for Cache Locality Enhancement on Shared-memory Multiprocessors. Naraig Manjikian, Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 1997. This dissertation proposes and evaluates compiler techniques that enhance cache locality and consequently improve the performance of parallel applications on shared-memory multiprocesso...

Exploiting Temporal and Spatial Constraints on Distributed Shared Objects

The advent of gigabit network technologies has made it possible to combine sets of uni- and multiprocessor workstations into a distributed, massively parallel computer system. Middleware, such as distributed shared objects (DSO), attempts to improve the programmability of such systems by providing globally accessible 'object' abstractions. Early research on distributed shared object systems concern...



Publication date: 2007